The R language has extensive graphical capabilities.
Graphics in R may be created by many different methods including base graphics and more advanced plotting packages such as lattice.
The ggplot2 package was created by Hadley Wickham and provides an intuitive plotting system for rapidly generating publication-quality graphics.
ggplot2 builds on the concept of the “Grammar of Graphics” (Wilkinson 2005, Bertin 1983), which describes a consistent syntax for constructing a wide range of complex graphics from a concise description of their components.
The structured syntax and high level of abstraction used by ggplot2 let the user concentrate on the visualisation rather than on the underlying code.
On top of this central philosophy ggplot2 has:
Overview of example code for the ggplot2 scatter plot.
ggplot(data = <default data set>,
aes(x = <default x axis variable>,
y = <default y axis variable>,
... <other default aesthetic mappings>),
... <other plot defaults>) +
geom_point(aes(size = <size variable for this geom>,
... <other aesthetic mappings>),
data = <data for this point geom>,
stat = <statistic string or function>,
position = <position string or function>,
color = <"fixed color specification">,
<other arguments, possibly passed to the _stat_ function>) +
scale_<aesthetic>_<type>(name = <"scale label">,
breaks = <where to put tick marks>,
labels = <labels for tick marks>,
... <other options for the scale>) +
ggtitle("Graphics/Plot")+
xlab("Weight")+
ylab("Height")+
theme(plot.title = element_text(colour = "gray"),
... <other theme elements>)
Actual code for the ggplot2 scatter plot.
ggplot(data=patients_clean,
aes(y=Weight,x=Height,colour=Sex,
size=BMI,shape=Pet)) +
geom_point()
As seen above, in order to produce a ggplot2 graph we need a minimum of:-
pcPlot$data[1:4,]
## ID Name Race Sex Smokes Height Weight Birth
## 1 AC/AH/001 Michael White Male Non-Smoker 182.87 76.57 1972-02-06
## 2 AC/AH/017 Derek White Male Non-Smoker 179.12 80.43 1972-06-15
## 3 AC/AH/020 Todd Black Male Non-Smoker 169.15 75.48 1972-07-09
## 4 AC/AH/022 Ronald White Male Non-Smoker 175.66 94.54 1972-08-17
## State Pet Grade Died Count Date.Entered.Study Age BMI
## 1 Georgia Dog 2 FALSE 0.01 2015-12-01 44 22.90
## 2 Missouri Dog 2 FALSE -1.31 2015-12-01 43 25.07
## 3 Pennsylvania None 2 FALSE -0.17 2015-12-01 43 26.38
## 4 Florida Cat 1 FALSE -1.10 2015-12-01 43 30.64
## Overweight
## 1 FALSE
## 2 TRUE
## 3 TRUE
## 4 TRUE
Within this gg/ggplot object the data has been defined.
The information to map the data to the plot can be added now using the aes() function.
pcPlot <- ggplot(data=patients_clean)
pcPlot <- pcPlot+aes(x=Height,y=Weight)
pcPlot$mapping
## Aesthetic mapping:
## * `x` -> `Height`
## * `y` -> `Weight`
pcPlot$theme
## list()
pcPlot$layers
## list()
Below, the geom_point() function is used to specify a point plot: a scatter plot of Height values on the x-axis versus Weight values on the y-axis.
pcPlot <- ggplot(data=patients_clean)
pcPlot <- pcPlot+aes(x=Height,y=Weight)
pcPlot <- pcPlot+geom_point()
pcPlot$mapping
## Aesthetic mapping:
## * `x` -> `Height`
## * `y` -> `Weight`
pcPlot$theme
## list()
pcPlot$layers
## [[1]]
## geom_point: na.rm = FALSE
## stat_identity: na.rm = FALSE
## position_identity
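The step-by-step construction above can be reproduced on any data frame. The sketch below uses a small made-up data frame, patients_demo (a hypothetical stand-in for patients_clean, with invented values), and counts the entries in the layers list before and after a geom is added.

```r
library(ggplot2)

# Hypothetical stand-in for patients_clean; the values are invented.
patients_demo <- data.frame(
  Height = c(182.9, 179.1, 169.2, 175.7, 160.3, 165.8),
  Weight = c(76.6, 80.4, 75.5, 94.5, 58.2, 63.9)
)

# Start with the data only, then add the mapping and the geom in turn.
demoPlot <- ggplot(data = patients_demo)
demoPlot <- demoPlot + aes(x = Height, y = Weight)
layers_before <- length(demoPlot$layers)   # 0: no geom has been added yet

demoPlot <- demoPlot + geom_point()
layers_after <- length(demoPlot$layers)    # 1: the geom_point layer

layers_before
layers_after
```

Until a geom is added the object holds data and mappings but nothing to draw, which is why the layers list is empty at first.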
More typically, the data and aesthetics are defined within the ggplot() function and geoms are applied afterwards.
pcPlot <- ggplot(data=patients_clean,
mapping=aes(x=Height,y=Weight))
pcPlot+geom_point()
As we have seen, an important element of a ggplot is the geom used. Following the specification of data, the geom describes the type of plot used.
Several geoms are available in ggplot2:-
pcPlot <- ggplot(data=patients_clean,
mapping=aes(x=Height,y=Weight))
pcPlot_smooth <- pcPlot+geom_smooth()
pcPlot_smooth
pcPlot <- ggplot(data=patients_clean,
mapping=aes(x=Sex))
pcPlot_bar <- pcPlot+geom_bar()
pcPlot_bar
pcPlot <- ggplot(data=patients_clean,
mapping=aes(x=Height))
pcPlot_hist <- pcPlot+geom_histogram()
pcPlot_hist
pcPlot <- ggplot(data=patients_clean,
mapping=aes(x=Height))
pcPlot_density <- pcPlot+geom_density()
pcPlot_density
pcPlot <- ggplot(data=patients_clean,
mapping=aes(x=Sex,y=Height))
pcPlot_violin <- pcPlot+geom_violin()
pcPlot_violin
An overview of geoms and their arguments can be found in the ggplot2 documentation or the ggplot2 cheatsheet.
To set an aesthetic of a plot to a constant value (e.g. to colour all points red), we can supply the color argument to the geom_point() function.
pcPlot <- ggplot(data=patients_clean,
mapping=aes(x=Height,y=Weight))
pcPlot+geom_point(color="red")
As we discussed earlier however, ggplot2 makes use of aesthetic mappings to assign variables in the data to the properties/aesthetics of the plot. This allows the properties of the plot to reflect variables in the data dynamically.
In these examples we supply additional information to the aes() function to define what information to display and how it is represented in the plot.
First we can recreate the plot we saw earlier.
pcPlot <- ggplot(data=patients_clean,
mapping=aes(x=Height,
y=Weight))
pcPlot+geom_point()
Similarly the shape of points may be adjusted.
pcPlot <- ggplot(data=patients_clean,
mapping=aes(x=Height,y=Weight,shape=Sex))
pcPlot+geom_point()
The aesthetic mappings may also be set directly in the geom_point() function, in the same place where the constant colour red was supplied earlier. This allows the same ggplot object to be used with different aesthetic mappings and varying geoms.
pcPlot <- ggplot(data=patients_clean)
pcPlot+geom_point(aes(x=Height,y=Weight,colour=Sex))
pcPlot+geom_point(aes(x=Height,y=Weight,colour=Smokes))
pcPlot+geom_point(aes(x=Height,y=Weight,colour=Smokes,shape=Sex))
pcPlot+geom_violin(aes(x=Sex,y=Height,fill=Smokes))
Again, for a comprehensive list of parameters and aesthetic mappings used in geom_type functions, see the ggplot2 documentation for individual geoms by using ?geom_type
?geom_point
or visit the ggplot2 documentation pages and cheatsheet.
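The difference between setting an aesthetic to a constant and mapping it to a variable can be checked by inspecting the data ggplot2 computes for a layer. The sketch below assumes a small invented data frame, patients_demo (hypothetical, not the course data), and uses layer_data() to look at the colours actually assigned to the points.

```r
library(ggplot2)

# Hypothetical stand-in for patients_clean; the values are invented.
patients_demo <- data.frame(
  Height = c(182.9, 179.1, 169.2, 175.7, 160.3, 165.8),
  Weight = c(76.6, 80.4, 75.5, 94.5, 58.2, 63.9),
  Sex    = c("Male", "Male", "Male", "Male", "Female", "Female")
)

base <- ggplot(patients_demo, aes(x = Height, y = Weight))

# Constant: colour is set outside aes(), so every point gets the same value.
p_const <- base + geom_point(colour = "red")

# Mapping: colour is set inside aes(), so it varies with the Sex column.
p_mapped <- base + geom_point(aes(colour = Sex))

# layer_data() returns the data frame built for the first layer.
const_colours  <- unique(layer_data(p_const)$colour)
mapped_colours <- unique(layer_data(p_mapped)$colour)

const_colours           # the single constant colour
length(mapped_colours)  # one colour per level of Sex
```

Only the mapped version produces a legend, because only there does the colour carry information from the data.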
One very useful feature of ggplot is faceting. This allows you to produce plots subset by variables in your data.
To facet our data into multiple plots we can use the facet_wrap() or facet_grid() functions, specifying the variable(s) to split by.
The facet_grid function is well suited to splitting the data by two factors.
Here we can plot the data with the Smokes variable as rows and Sex variable as columns.
facet_grid(Rows~Columns)
pcPlot <- ggplot(data=patients_clean,aes(x=Height,y=Weight,
colour=Sex))+geom_point()
pcPlot + facet_grid(Smokes~Sex)
To split by one factor we can apply the facet_grid() function, omitting the variable before the “~” to facet along columns in the plot.
facet_grid(~Columns)
pcPlot <- ggplot(data=patients_clean,aes(x=Height,y=Weight,
colour=Sex))+geom_point()
pcPlot + facet_grid(~Sex)
To split along rows in the plot, the variable is placed before the “~.”.
facet_grid(Rows~.)
pcPlot <- ggplot(data=patients_clean,aes(x=Height,y=Weight,
colour=Sex))+geom_point()
pcPlot + facet_grid(Sex~.)
For more complex faceting both facet_grid and facet_wrap can accept combinations of variables. Here we use facet_wrap.
pcPlot <- ggplot(data=patients_clean,aes(x=Height,y=Weight,
colour=Sex))+geom_point()
pcPlot + facet_wrap(~Pet+Smokes+Sex)
Or in a nice grid format using facet_grid() and the Smokes variable against a combination of Sex and Pet.
pcPlot + facet_grid(Smokes~Sex+Pet)
Here, R decides the order in which to arrange the panels according to the levels of the categorical variable. By default this is alphabetical order, i.e. Female before Male.
summary(patients_clean$Sex)
## Female Male
## 55 45
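Because panel order follows the factor levels, one way to change it is to redefine the levels before plotting. The sketch below assumes a small invented data frame, patients_demo (hypothetical, not the course data), and puts Male before Female.

```r
library(ggplot2)

# Hypothetical stand-in for patients_clean; the values are invented.
patients_demo <- data.frame(
  Height = c(182.9, 179.1, 169.2, 175.7, 160.3, 165.8),
  Weight = c(76.6, 80.4, 75.5, 94.5, 58.2, 63.9),
  Sex    = factor(c("Male", "Male", "Male", "Male", "Female", "Female"))
)

# factor() sorts the levels alphabetically unless told otherwise.
default_levels <- levels(patients_demo$Sex)

# Redefine the level order explicitly; the data values themselves are unchanged.
patients_demo$Sex <- factor(patients_demo$Sex, levels = c("Male", "Female"))
new_levels <- levels(patients_demo$Sex)

# The facets are now drawn in the new level order.
demoPlot <- ggplot(patients_demo, aes(x = Height, y = Weight)) +
  geom_point() +
  facet_grid(~Sex)
demoPlot
```

The same reordering also affects legends and discrete axes, since they are driven by the same factor levels.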
Scales and their legends have so far been handled using ggplot2 defaults. ggplot2 offers functionality to have finer control over scales and legends using the scale methods.
Scale methods are divided into functions by combinations of
the aesthetics they control.
the type of data mapped to scale.
scale_aesthetic_type
Try typing in scale_ then tab to autocomplete. This will provide some examples of the scale functions available in ggplot2.
Both continuous and discrete X/Y scales can be controlled in ggplot2 using the
scale_(x/y)_(continuous/discrete)
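As a minimal sketch of a continuous axis scale, the example below sets the name, tick positions and limits of the x-axis with scale_x_continuous(). The data frame patients_demo is invented for illustration, and the axis units (cm and kg) are an assumption.

```r
library(ggplot2)

# Hypothetical stand-in for patients_clean; the values are invented.
patients_demo <- data.frame(
  Height = c(182.9, 179.1, 169.2, 175.7, 160.3, 165.8),
  Weight = c(76.6, 80.4, 75.5, 94.5, 58.2, 63.9)
)

demoPlot <- ggplot(patients_demo, aes(x = Height, y = Weight)) +
  geom_point() +
  # name: axis title; breaks: tick positions; limits: range of the axis
  scale_x_continuous(name = "Height (cm)",
                     breaks = c(160, 170, 180),
                     limits = c(155, 190)) +
  scale_y_continuous(name = "Weight (kg)")
demoPlot

# The scale object stored in the plot records what was requested.
x_scale <- demoPlot$scales$get_scales("x")
x_scale$breaks
x_scale$limits
```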
Similarly, control over discrete scales is shown below.
pcPlot <- ggplot(data=patients_clean,aes(x=Sex,y=Height))
pcPlot +
geom_violin(aes(x=Sex,y=Height)) +
scale_x_discrete(labels=c("Women", "Men"))
Multiple X/Y scales can be combined to give full control of axis marks.
pcPlot <- ggplot(data=patients_clean,aes(x=Sex,y=Height,fill=Smokes))
pcPlot +
geom_violin(aes(x=Sex,y=Height)) +
scale_x_discrete(labels=c("Women", "Men"))+
scale_y_continuous(breaks=c(160,180),labels=c("Short", "Tall"))
When using fill, colour, linetype, shape, size or alpha aesthetic mappings, the scales are automatically selected for you and the appropriate legends created.
pcPlot <- ggplot(data=patients_clean,
aes(x=Height,y=Weight,colour=Sex))
pcPlot + geom_point(size=4)
Manual control of discrete variables can be performed using scale_aes_Of_Interest_manual with the values parameter. Additionally, in this example an updated name for the legend is provided.
pcPlot <- ggplot(data=patients_clean,
aes(x=Height,y=Weight,colour=Sex))
pcPlot + geom_point(size=4) +
scale_color_manual(values = c("Green","Purple"),
name="Gender")
Here we have specified the colours to be used (hence the manual), but when a variable has many levels this may be impractical, and often we would like ggplot2 to choose colours from a scale of our choice.
The brewer set of scale functions allows the user to make use of a range of palettes available from colorbrewer.
Diverging: BrBG, PiYG, PRGn, PuOr, RdBu, RdGy, RdYlBu, RdYlGn, Spectral
Qualitative: Accent, Dark2, Paired, Pastel1, Pastel2, Set1, Set2, Set3
Sequential: Blues, BuGn, BuPu, GnBu, Greens, Greys, Oranges, OrRd, PuBu, PuBuGn, PuRd, Purples, RdPu, Reds, YlGn, YlGnBu, YlOrBr, YlOrRd
pcPlot <- ggplot(data=patients_clean,
aes(x=Height,y=Weight,colour=Pet))
pcPlot + geom_point(size=4) +
scale_color_brewer(palette = "Set2")
For more details on palette sizes and styles visit the colorbrewer website and ggplot2 reference page.
So far we have looked at qualitative scales, but ggplot2 also offers much functionality for continuous scales such as size, alpha (transparency), colour and fill.
scale_alpha_continuous() - For transparency.
scale_size_continuous() - For control of size.
Below, the range of sizes to be used in the plot is limited to between 3 and 6.
pcPlot <- ggplot(data=patients_clean,
aes(x=Height,y=Weight,size=BMI))
pcPlot + geom_point(alpha=0.8) +
scale_size_continuous(range = c(3,6))
The limits of the scale can also be controlled, but it is important to note that data outside the scale limits is removed from the plot.
pcPlot <- ggplot(data=patients_clean,
aes(x=Height,y=Weight,size=BMI))
pcPlot + geom_point() + scale_size_continuous(range = c(3,6),
limits = c(25,40))
Control of colour/fill scales can be best achieved through the gradient subfunctions of scale.
scale_(colour/fill)_gradient - 2 colour gradient (eg. low to high BMI)
scale_(colour/fill)_gradient2 - Diverging colour scale with a midpoint colour (e.g. Down, No Change, Up)
Both functions take a common set of arguments:-
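Among the arguments the two functions share are low and high (the colours at the ends of the gradient) and name (the legend title). A minimal sketch of the two-colour case follows, using an invented data frame, patients_demo, and a BMI column computed on the assumption that Height is in cm and Weight in kg.

```r
library(ggplot2)

# Hypothetical stand-in for patients_clean; the values are invented.
patients_demo <- data.frame(
  Height = c(182.9, 179.1, 169.2, 175.7, 160.3, 165.8),
  Weight = c(76.6, 80.4, 75.5, 94.5, 58.2, 63.9)
)
# Assumption: Height in cm and Weight in kg, so BMI = kg / m^2.
patients_demo$BMI <- patients_demo$Weight / (patients_demo$Height / 100)^2

demoPlot <- ggplot(patients_demo, aes(x = Height, y = Weight, colour = BMI)) +
  geom_point(size = 4) +
  # low/high: colours for the smallest and largest BMI values
  scale_colour_gradient(low = "white", high = "red",
                        name = "Body Mass Index")
demoPlot

# Each distinct BMI value is assigned its own colour along the gradient.
point_colours <- layer_data(demoPlot)$colour
length(unique(point_colours))
```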
Similarly we can use the scale_colour_gradient2 function which allows for the specification of a midpoint value and its associated colour.
pcPlot <- ggplot(data=patients_clean,
aes(x=Height,y=Weight,colour=BMI))
pcPlot + geom_point(size=4,alpha=0.8) +
scale_colour_gradient2(low = "Blue",mid="Black", high="Red",
midpoint = median(patients_clean$BMI))
As with previous continuous scales, limits and custom labels in the scale legend can be added.
pcPlot <- ggplot(data=patients_clean,
aes(x=Height,y=Weight,colour=BMI))
pcPlot + geom_point(size=4,alpha=0.8) +
scale_colour_gradient2(low = "Blue",
mid="Black",
high="Red",
midpoint = median(patients_clean$BMI),
breaks=c(25,30),labels=c("Low","High"),
name="Body Mass Index")
Multiple scales may be combined to create highly customisable plots and scales.
pcPlot <- ggplot(data=patients_clean,
aes(x=Height,y=Weight,colour=BMI,shape=Sex))
pcPlot + geom_point(size=4,alpha=0.8)+
scale_shape_discrete(name="Gender") +
scale_colour_gradient2(low = "Blue",mid="Black",high="Red",
midpoint = median(patients_clean$BMI),
breaks=c(25,30),labels=c("Low","High"),
name="Body Mass Index")
The stat_smooth() function can be used to fit a line to the data being displayed.
pcPlot <- ggplot(data=patients_clean,
mapping=aes(x=Weight,y=Height))
pcPlot+geom_point()+stat_smooth()
By default, for data sets of fewer than 1,000 observations, a “loess” smooth line is plotted by stat_smooth(). Other methods available include lm, glm, gam and rlm.
pcPlot <- ggplot(data=patients_clean,
mapping=aes(x=Weight,y=Height))
pcPlot+geom_point()+stat_smooth(method="lm")
A useful feature of ggplot2 is that it uses previously defined grouping when performing smoothing.
If colour by Sex is an aesthetic mapping then two smooth lines are drawn, one for each sex.
pcPlot <- ggplot(data=patients_clean,
mapping=aes(x=Weight,y=Height,colour=Sex))
pcPlot+geom_point()+stat_smooth(method="lm")
This behaviour can be overridden by specifying an aes within the stat_smooth() function and setting inherit.aes to FALSE.
pcPlot <- ggplot(data=patients_clean,
mapping=aes(x=Weight,y=Height,colour=Sex))
pcPlot+geom_point()+stat_smooth(aes(x=Weight,y=Height),method="lm",
inherit.aes = FALSE)
Another useful method is stat_summary(), which allows a custom statistical function to be applied and the result visualised.
The fun.y parameter (renamed fun in ggplot2 3.3.0 and later, where fun.y is deprecated) specifies a function to apply to the y values for every value of x.
In this example we use it to plot the quantiles of the Female and Male Height data
pcPlot <- ggplot(data=patients_clean,
mapping=aes(x=Sex,y=Height))+geom_jitter()
pcPlot+stat_summary(fun.y=quantile,geom="point",
colour="purple",size=8)
Themes specify the details of data-independent elements of the plot. This includes titles, background colour, text fonts etc.
The graphs created so far have all used the default theme, theme_grey(), but ggplot2 allows the theme to be specified.
Predefined themes can be applied to a ggplot2 object using a family of functions theme_style()
In the example below the minimal theme is applied to the scatter plot seen earlier.
pcPlot <- ggplot(data=patients_clean,
mapping=aes(x=Weight,y=Height))+geom_point()
pcPlot+theme_minimal()
Several predefined themes are available within ggplot2 including:
theme_bw
theme_classic
theme_dark
theme_gray
theme_light
theme_linedraw
theme_minimal
Packages such as ggthemes also contain many useful collections of predefined theme_style functions.
and 5 groups of related elements:-
These elements may be specified by the use of their appropriate element functions including:
and additionally element_blank() to set an element to “blank”
A detailed description of controlling elements within a theme can be seen at the ggplot2 vignette and by typing ?theme into the console.
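As a small sketch of how element functions are passed to theme(), the example below colours all text red with element_text(), sets a white panel background with element_rect() and removes the minor grid lines with element_blank(). The data frame patients_demo is invented for illustration.

```r
library(ggplot2)

# Hypothetical stand-in for patients_clean; the values are invented.
patients_demo <- data.frame(
  Height = c(182.9, 179.1, 169.2, 175.7, 160.3, 165.8),
  Weight = c(76.6, 80.4, 75.5, 94.5, 58.2, 63.9)
)

demoPlot <- ggplot(patients_demo, aes(x = Weight, y = Height)) +
  geom_point()

demoThemed <- demoPlot +
  theme(
    text             = element_text(colour = "red"),   # text settings
    panel.background = element_rect(fill = "white"),   # rectangle settings
    panel.grid.minor = element_blank()                  # draw nothing
  )
demoThemed

# The settings are stored in the theme slot of the plot object.
text_colour <- demoThemed$theme$text$colour
text_colour
```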
If we wished to set the y-axis label to be at an angle we can adjust that as well.
pcPlot <- ggplot(data=patients_clean,
mapping=aes(x=Weight,y=Height))+geom_point()
pcPlot + theme(text = element_text(colour="red"),
axis.text = element_text(colour="red"),
axis.title.y = element_text(angle=0))
Finally we may wish to remove the axis line, set the background of plot panels to be white and give the strips (title above facet) a cyan background colour.
pcPlot <- ggplot(data=patients_clean,
mapping=aes(x=Weight,y=Height))+
geom_point()+
facet_grid(Sex~Smokes)
pcPlot+
theme(
text = element_text(colour="red"),
axis.text = element_text(colour="red"),
axis.title.y = element_text(angle=0),
axis.line = element_line(linetype = 0),
panel.background=element_rect(fill="white"),
strip.background=element_rect(fill="cyan")
)
A useful example of using the theme can be seen in controlling the legend. By default the legend is on the right of the plot.
pcPlot <- ggplot(data=patients_clean,aes(x=Height,y=Weight,
colour=Sex))+geom_point()
pcPlot
We can control all aspects of a legend as we can for other theme elements.
pcPlot <- ggplot(data=patients_clean,aes(x=Height,y=Weight,
colour=Sex))+geom_point()
pcPlot+theme(legend.text = element_text(colour="darkred"),
legend.title = element_text(size=20)
)
In the example below, we maintain all elements set by theme_bw() but overwrite the theme element attribute of the colour of text.
pcPlot <- ggplot(data=patients_clean,
mapping=aes(x=Weight,y=Height))+geom_point()
pcPlot+
theme_bw()+
theme(text = element_text(colour="red"))
The consequence can be seen by comparing the effect of theme() on a plot with the default theme and on a plot with theme_minimal().
Since the default theme, theme_grey(), contains its own specification for the axis.text colour, setting text with the “+” operator does not replace it.
pcPlot+
theme(text = element_text(colour="red"))
pcPlot+
theme_minimal()+
theme(text = element_text(colour="red"))
In contrast, %+replace% replaces whole elements within a theme regardless of whether they have been previously specified in the old theme.
When using the %+replace% operator:
Theme elements specified in the new theme replace the corresponding elements in the old theme.
Properties of a replaced element that are not specified in the new theme are reset to NULL rather than carried over from the old theme.
oldTheme <- theme_bw()
newTheme_Plus <- theme_bw() +
theme(text = element_text(colour="red"))
newTheme_Replace <- theme_bw() %+replace%
theme(text = element_text(colour="red"))
oldTheme$text
## List of 11
## $ family : chr ""
## $ face : chr "plain"
## $ colour : chr "black"
## $ size : num 11
## $ hjust : num 0.5
## $ vjust : num 0.5
## $ angle : num 0
## $ lineheight : num 0.9
## $ margin :Classes 'margin', 'unit' atomic [1:4] 0 0 0 0
## .. ..- attr(*, "valid.unit")= int 8
## .. ..- attr(*, "unit")= chr "pt"
## $ debug : logi FALSE
## $ inherit.blank: logi TRUE
## - attr(*, "class")= chr [1:2] "element_text" "element"
newTheme_Plus$text
## List of 11
## $ family : chr ""
## $ face : chr "plain"
## $ colour : chr "red"
## $ size : num 11
## $ hjust : num 0.5
## $ vjust : num 0.5
## $ angle : num 0
## $ lineheight : num 0.9
## $ margin :Classes 'margin', 'unit' atomic [1:4] 0 0 0 0
## .. ..- attr(*, "valid.unit")= int 8
## .. ..- attr(*, "unit")= chr "pt"
## $ debug : logi FALSE
## $ inherit.blank: logi FALSE
## - attr(*, "class")= chr [1:2] "element_text" "element"
newTheme_Replace$text
## List of 11
## $ family : NULL
## $ face : NULL
## $ colour : chr "red"
## $ size : NULL
## $ hjust : NULL
## $ vjust : NULL
## $ angle : NULL
## $ lineheight : NULL
## $ margin : NULL
## $ debug : NULL
## $ inherit.blank: logi FALSE
## - attr(*, "class")= chr [1:2] "element_text" "element"
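The same comparison can be made programmatically rather than by reading the printed structures. The sketch below rebuilds the two themes and checks one property, size, that was not specified in the new element. Exact printed structures may differ between ggplot2 versions, so only the presence or absence of the value is checked.

```r
library(ggplot2)

# "+" updates only the properties that are specified in the new element.
plus_theme <- theme_bw() +
  theme(text = element_text(colour = "red"))

# "%+replace%" swaps in the new element as a whole.
replace_theme <- theme_bw() %+replace%
  theme(text = element_text(colour = "red"))

plus_has_size    <- !is.null(plus_theme$text$size)     # size kept from theme_bw()
replace_has_size <- !is.null(replace_theme$text$size)  # size was not specified

plus_has_size
replace_has_size
```

In practice this means "+" is usually the safer choice for small tweaks, while %+replace% suits building a theme whose elements are fully specified.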
So far no plot titles have been specified. Plot titles can be specified using the labs() function.
pcPlot <- ggplot(data=patients_clean,
mapping=aes(x=Weight,y=Height))+geom_point()
pcPlot+labs(title="Weight vs Height",y="Height (cm)")
or specified using the ggtitle() and xlab()/ylab() functions.
pcPlot <- ggplot(data=patients_clean,
mapping=aes(x=Weight,y=Height))+geom_point()
pcPlot+ggtitle("Weight vs Height")+ylab("Height (cm)")
Plots produced by ggplot can be saved from the interactive viewer as with standard plots.
The ggsave() function allows for additional arguments to be specified including the type, resolution and size of plot.
By default ggsave() will use the size of your current graphics window when saving plots so it may be important to specify width and height arguments desired.
pcPlot <- ggplot(data=patients_clean,
mapping=aes(x=Weight,y=Height))+geom_point()
ggsave(filename = "anExampleplot.png",plot = pcPlot,width = 15,
height = 15,units = "cm")
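A minimal sketch of saving with an explicit size and resolution follows. The data frame patients_demo and the file name demo_plot.png are invented for illustration, and the file is written to a temporary directory so that nothing in the working directory is overwritten.

```r
library(ggplot2)

# Hypothetical stand-in for patients_clean; the values are invented.
patients_demo <- data.frame(
  Height = c(182.9, 179.1, 169.2, 175.7, 160.3, 165.8),
  Weight = c(76.6, 80.4, 75.5, 94.5, 58.2, 63.9)
)

demoPlot <- ggplot(patients_demo, aes(x = Weight, y = Height)) +
  geom_point()

# Write to a temporary location; the file type is taken from the extension.
out_file <- file.path(tempdir(), "demo_plot.png")
ggsave(filename = out_file, plot = demoPlot,
       width = 15, height = 15, units = "cm", dpi = 300)

file.exists(out_file)
```

Passing plot explicitly avoids relying on the last plot displayed, which is what ggsave() uses when no plot is given.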